Triage 27 dependabot alerts to zero + harden vLLM 0.21.0 build docs by grancier · Pull Request #3 · teleoric/rag-poc

grancier · 2026-05-24T23:06:31Z

Summary

Three related currency / hygiene updates in one PR. Each commit is independently reviewable.

Commit 1 — Triage all 27 dependabot alerts to zero

Drives the open advisory count from 27 dependabot alerts (1 critical, 8 high, 17 moderate, 1 low) to zero, and `npm audit` from 14 findings to 0. The five vulnerable root packages collapse into a small set of fixes:

| Fix | Impact |
| --- | --- |
| Drop `@langchain/community` | Never imported in `src/`. Removing it kills the `ibm-cloud-sdk-core → axios → follow-redirects` chain (11 axios CVEs incl. prototype-pollution highs) plus a transitive `ws` and a redundant `langsmith` copy. |
| Replace `@xenova/transformers` (2.17.2, abandoned) with `@huggingface/transformers` (4.2.0) | xenova was frozen at 2.17.2 and transitively pinned to `protobufjs@6.11.4`, which carries the CVSS-9.8 Arbitrary Code Execution advisory plus 8 others. HF ships `onnxruntime-web@1.26.x`. One-line import swap: `pipeline()` and `FeatureExtractionPipeline` are signature-identical. |
| Bump direct `uuid` to `^13.0.2` | Clears the buffer-bounds advisory on `uuidv5` (hot path: chunk IDs and Qdrant point IDs). |
| npm `overrides` for transitive `uuid`, `langsmith`, `ws` | Holds the fixes when `@langchain/core` / `openai` would otherwise pull older ranges. |
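
One way to confirm that the `overrides` entries actually hold after `npm install` is to scan the lockfile for every installed copy of a package. The sketch below is in Python purely for illustration and is not part of this PR. It assumes a lockfileVersion 2 or 3 `package-lock.json`, whose `packages` map is keyed by install path; the function name and the sample versions are hypothetical.

```python
import json


def installed_versions(lockfile: dict, package: str) -> dict:
    """Map each install path of `package` to its resolved version.

    Assumes the lockfileVersion 2/3 layout, where `packages` is keyed by
    paths such as "node_modules/uuid" or
    "node_modules/@langchain/core/node_modules/uuid".
    """
    found = {}
    for path, meta in lockfile.get("packages", {}).items():
        # The package name is whatever follows the last "node_modules/".
        if path.rsplit("node_modules/", 1)[-1] == package:
            found[path] = meta.get("version")
    return found


# Hypothetical lockfile fragment; the versions are made up for the example.
sample = json.loads("""
{
  "lockfileVersion": 3,
  "packages": {
    "": {"name": "rag-poc"},
    "node_modules/uuid": {"version": "13.0.2"},
    "node_modules/@langchain/core/node_modules/uuid": {"version": "13.0.2"},
    "node_modules/ws": {"version": "8.18.0"}
  }
}
""")

copies = installed_versions(sample, "uuid")
print(copies)
```

If any nested copy reports a version below the patched floor, the override for that package is not taking effect.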

Commit 2 — Bump vLLM build docs to v0.21.0

v0.19.0 → v0.21.0 introduces three breaking deltas worth flagging in vllm-setup.md:

- C++20 build requirement (v0.21.0). gcc-10+ required; the Ubuntu 22.04 (gcc-11) and 24.04 (gcc-13) defaults are both fine.
- PyTorch ≥ 2.11 (v0.20.0). Nightly ROCm index resolves automatically; wording-only change to note the floor.
- HuggingFace transformers ≥ 5 (v0.20.0). Picked up transitively, no manual step.

The RDNA3 workarounds — BUILD_FA=0, --enforce-eager, ROCM_ATTN backend, no FP8 / hipBLASLt — are unchanged.
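
The three version floors above can be checked mechanically before starting a build. The following is a minimal sketch, not something the guide ships: the `FLOORS` table and both function names are hypothetical, and how the version strings are obtained (for example from `gcc -dumpversion` or `torch.__version__`) is left to the caller.

```python
import re

# Floors taken from the list above: gcc >= 10, PyTorch >= 2.11, transformers >= 5.
FLOORS = {"gcc": (10,), "torch": (2, 11), "transformers": (5,)}


def parse_version(text: str) -> tuple:
    """Return the leading numeric components of a version string.

    Suffixes such as ".dev20260520+rocm7.2" are ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if match is None:
        raise ValueError(f"no leading version number in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_floor(name: str, version_text: str) -> bool:
    """Compare only as many components as the floor specifies."""
    floor = FLOORS[name]
    return parse_version(version_text)[: len(floor)] >= floor


# Illustrative version strings, not measured on any real host.
print(meets_floor("gcc", "11.4.0"))
print(meets_floor("torch", "2.10.1"))
```

Comparing integer tuples rather than raw strings avoids the classic trap where `"2.9"` sorts after `"2.11"`.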

Commit 3 — Harden the doc with empirical lessons from validating v0.21.0 on real hardware

Four gaps in commit 2 surfaced when actually rebuilding against ROCm 7.2.1 / gfx1100, all now documented:

- Default torch nightly index flips rocm6.4 → rocm7.2. The rocm7.2 index is live; older rocm6.4 wheels embed a HIP ABI that mismatches /opt/rocm 7.2 and produce `undefined symbol: _ZN3c10*` at runtime when vLLM's compiled extensions load against the wrong libtorch.
- Add `CMAKE_ARGS="-DHIP_FOUND=TRUE"` to the build env. The rocm7.2 nightly torch's bundled LoadHIP.cmake detects HIP correctly but no longer exports the HIP_FOUND global that vLLM's CMakeLists.txt:151 checks. Without the override the build dies with "Can't find CUDA or HIP installation" despite HIP being clearly present in the configure log.
- Add a `python -c "import vllm._C, vllm._rocm_C"` smoke immediately after the build. opt-125m alone is a false-positive sanity check: its code path uses Python fallbacks and will load + generate even with stale/broken .so files. The Llama-architecture model is the first thing that exercises kernels registered by _C, and it fails at `SiluAndMul.__init__` with `AttributeError: '_OpNamespace' '_C' object has no attribute 'silu_and_mul'` when the extensions didn't load.
- Restructure §8 into two stages (opt-125m for build sanity, Llama-3.1-8B for kernel sanity) and document the harmless first-request Triton JIT-compile warnings (new v0.21.0 jit_monitor feature) and the "Cannot use ROCm custom paged attention kernel, falling back to Triton" line (a fallback within ROCM_ATTN, not from it; normal on RDNA3).
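
The import smoke described above can also be written as a small script that reports each extension separately, which helps when only one of the two `.so` files is stale. This is a sketch under assumptions: `check_extensions` is a hypothetical helper, not something vLLM or the guide provides, and it relies only on the standard `importlib` module.

```python
import importlib


def check_extensions(module_names):
    """Try to import each module and return {name: None or error text}.

    A compiled extension that fails to load (for example on an undefined
    symbol) surfaces as an ImportError, which is recorded rather than raised.
    """
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = None
        except Exception as exc:  # broad on purpose: report, don't crash
            results[name] = f"{type(exc).__name__}: {exc}"
    return results


def failed(results):
    """Return only the modules that did not import."""
    return {name: err for name, err in results.items() if err is not None}
```

Calling `failed(check_extensions(["vllm._C", "vllm._rocm_C"]))` right after the build should return an empty dict on a healthy install; any entry it does return carries the loader's own error text.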

Troubleshooting table gains five new rows: HIP_FOUND CMake error, torch-ABI undefined symbol, opt-125m false-pass, jit_monitor warnings, paged-attention Triton fallback.
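
For the torch-ABI row in particular, a quick pre-build check can catch the rocm6.4-wheel-on-ROCm-7.2 mismatch before it shows up as an undefined symbol at runtime. The sketch below assumes the nightly wheel's version string carries a `+rocmX.Y` local tag and that the system ROCm version is available as a string (on many installs it can be read from `/opt/rocm/.info/version`, which is an assumption here). Both helper names are hypothetical, and matching major.minor is a heuristic, not a guarantee of ABI compatibility.

```python
import re


def wheel_rocm_tag(torch_version: str):
    """Extract (major, minor) from a '+rocmX.Y' local version tag, or None."""
    match = re.search(r"\+rocm(\d+)\.(\d+)", torch_version)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def system_rocm_major_minor(system_version: str):
    """Extract (major, minor) from a system ROCm version such as '7.2.1'."""
    match = re.search(r"(\d+)\.(\d+)", system_version)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def rocm_versions_agree(torch_version: str, system_version: str) -> bool:
    """True only when the wheel is a ROCm build matching the system major.minor."""
    wheel = wheel_rocm_tag(torch_version)
    return wheel is not None and wheel == system_rocm_major_minor(system_version)


# Illustrative strings only; real values come from the installed wheel and host.
print(rocm_versions_agree("2.11.0.dev20260520+rocm7.2", "7.2.1"))
print(rocm_versions_agree("2.11.0.dev20260520+rocm6.4", "7.2.1"))
```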

Test plan

- `npm audit` → 0 vulnerabilities
- `npx tsc --noEmit` clean
- vLLM v0.21.0 builds and runs against ROCm 7.2.1 / gfx1100 following the updated vllm-setup.md
- `make vllm` brings up Llama-3.1-8B on :8000; `curl /v1/models` returns the model id
- `npm start` runs the full pipeline: ingests for two tenants, returns grounded answers with parsed citations, cross-tenant probe returns "Not supported by available context."
- HF transformers swap produces working embeddings: both per-tenant retrievals hit the right chunk
- Idempotency: points_count stays stable on re-ingest (4 tenant-isolated points after cleanup of 2 pre-migration legacy points)
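
The idempotency result is what one would expect from name-based (version 5) UUIDs: the same input always hashes to the same ID, so a re-ingest overwrites existing points instead of adding new ones. The sketch below illustrates that property with Python's standard `uuid` module. The project itself uses the JavaScript `uuid` package, and the namespace and the `tenant/doc/chunk` key format here are hypothetical, not taken from the repository.

```python
import uuid

# Hypothetical namespace; the real project may derive its IDs differently.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.invalid/rag-poc")


def point_id(tenant_id: str, doc_id: str, chunk_index: int) -> str:
    """Derive a deterministic version-5 UUID from the chunk's identity."""
    return str(uuid.uuid5(NAMESPACE, f"{tenant_id}/{doc_id}/{chunk_index}"))


first = point_id("tenant-a", "handbook", 0)
again = point_id("tenant-a", "handbook", 0)
other = point_id("tenant-b", "handbook", 0)

print(first == again)  # same inputs, same ID
print(first == other)  # different tenant, different ID
```

Including the tenant in the hashed name keeps identical documents from two tenants from colliding on one point ID.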

Operational notes

Post-merge migration. If you have an existing saas_docs Qdrant collection from before this PR, you'll find pre-PR points sitting alongside the new tenant-isolated ones. They're correctly excluded by the tenant filter, but they're noise. Clean with:

curl -s -X POST http://127.0.0.1:6333/collections/saas_docs/points/delete \
  -H 'Content-Type: application/json' \
  -d '{"filter": {"must": [{"is_empty": {"key": "metadata.tenantId"}}]}}'
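
Before deleting, it can be worth counting how many points the filter matches. The sketch below builds both requests with Python's standard library and deliberately does not send them. It assumes Qdrant's `points/count` endpoint accepts the same filter body plus an `exact` flag; `build_request` is a hypothetical helper, and the URL and filter are copied from the command above.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:6333/collections/saas_docs/points"

# Same filter as the curl command: points with no metadata.tenantId.
LEGACY_FILTER = {"must": [{"is_empty": {"key": "metadata.tenantId"}}]}


def build_request(action: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST to <BASE_URL>/<action>."""
    return urllib.request.Request(
        f"{BASE_URL}/{action}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


count_req = build_request("count", {"filter": LEGACY_FILTER, "exact": True})
delete_req = build_request("delete", {"filter": LEGACY_FILTER})

# Sending is left to the caller, for example:
#   with urllib.request.urlopen(count_req) as resp:
#       print(json.load(resp))
```

If the count differs from the number of legacy points you expect, inspect the collection before running the delete.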

🤖 Generated with Claude Code

Audit before: 27 dependabot alerts (1 critical, 8 high, 17 moderate, 1 low) / 14 npm audit findings across five root-cause packages. After: 0 vulnerabilities reported by either tool.

Resolution path, in order of leverage:

- Drop @langchain/community. It was never imported. Removing it kills the entire @langchain/community → ibm-cloud-sdk-core → axios chain (11 axios CVEs, follow-redirects header leak, and a stagehand-pulled ws instance), along with a transitive langsmith copy.
- Replace @xenova/transformers (2.17.2, abandoned at this version) with @huggingface/transformers (4.2.0, the maintained successor). xenova pinned onnxruntime-web@1.14.0, which pinned onnx-proto@4.0.4, which pinned protobufjs@6.11.4, which is vulnerable to a CVSS-9.8 RCE plus eight other advisories with no upstream fix coming. The HF package ships onnxruntime-web@1.26.x and modern protobuf handling, and its pipeline() / FeatureExtractionPipeline surface is signature-identical to xenova. Migration is a single import line; the runtime contract for LocalTransformersEmbeddings is unchanged.
- Bump direct uuid to ^13.0.2 to clear the buffer-bounds advisory on v3/v5/v6. We use uuidv5 for both chunk IDs and Qdrant point IDs, so this is on the hot path.
- Pin transitive uuid, langsmith, and ws via npm overrides so the fixes hold even when @langchain/core or openai resolves to older ranges of those packages.

Verification deferred to runtime: The HF transformers swap changes the ONNX execution path. Embedding vectors should be byte-equivalent against the same Xenova/all-MiniLM-L6-v2 model but this hasn't been smoke-tested end-to-end. If output shifts, the existing Qdrant collection's points become unreachable to new queries; drop and re-ingest after merge.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
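
The deferred verification amounts to embedding the same text with the old and new packages and comparing the vectors. Below is a minimal sketch of the comparison step only: producing the two vectors is not shown, the helper names are hypothetical, and the tolerance is an arbitrary assumption rather than a project threshold.

```python
import math


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def max_abs_diff(a, b) -> float:
    """Largest element-wise difference between two equal-length vectors."""
    return max(abs(x - y) for x, y in zip(a, b))


def embeddings_match(old, new, tol: float = 1e-5) -> bool:
    """True when both vectors have the same length and agree within tol."""
    return len(old) == len(new) and max_abs_diff(old, new) <= tol


# Made-up vectors standing in for the old and new pipeline outputs.
old_vec = [0.12, -0.40, 0.88]
new_vec = [0.12, -0.40, 0.88]
print(embeddings_match(old_vec, new_vec), cosine_similarity(old_vec, new_vec))
```

A tolerance check is usually more practical than strict byte equality, since different ONNX runtime versions can differ in the last few floating-point bits without affecting retrieval.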

Currency update on top of the npm triage. vLLM 0.19.0 is two minor versions behind upstream (0.21.0 is current) and the documented build procedure was written around 0.19.0 specifically. v0.20.0 and v0.21.0 introduced three breaking deltas that affect this guide:

- C++20 build requirement (v0.21.0). gcc-10+ required. Ubuntu 22.04 default gcc-11 and 24.04 default gcc-13 both satisfy this; flagged in the prerequisites and the troubleshooting table for older distros.
- PyTorch 2.11 minimum (v0.20.0). The existing nightly index resolves to 2.11+ automatically; just a wording change to note the floor.
- HuggingFace transformers v5 required (v0.20.0). Picked up transitively by the build; no install-step change.

The RDNA3-specific workarounds (BUILD_FA=0, --enforce-eager, ROCM_ATTN backend, no FP8, no hipBLASLt, hipBLAS+TunableOp) are all unchanged: the release notes do not mention gfx1100 CUDA-graph stability fixes, so --enforce-eager stays on for now.

Also bumps the ROCm minor recommendation from "7.2.x" to "7.2.2", which v0.21.0 references explicitly (#41386).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Four findings from validating the v0.21.0 doc bump against an actual ROCm 7.2.1 / gfx1100 host today, each of which would have saved an hour or so if the doc had covered it:

- Flip the default torch nightly index from rocm6.4 → rocm7.2. The rocm7.2 index is now published; the older rocm6.4 wheels carry a HIP ABI that doesn't match /opt/rocm 7.2 and produce `undefined symbol: _ZN3c10*` at runtime when vLLM's compiled extensions try to load against the mismatched libtorch.
- Add CMAKE_ARGS="-DHIP_FOUND=TRUE" to the build env. The rocm7.2 nightly torch's bundled LoadHIP.cmake detects HIP correctly and prints every ROCm library version but no longer exports the HIP_FOUND global variable that vLLM's CMakeLists.txt:151 checks. Without the override the build dies with "Can't find CUDA or HIP installation" despite HIP being clearly present in the configure log.
- Add a `python -c "import vllm._C, vllm._rocm_C"` smoke immediately after the build. Necessary because opt-125m's code path uses Python fallbacks and will load + generate even with stale/broken .so files, which makes the existing smoke test a false positive on a real ABI problem. The Llama-architecture model is the first thing that exercises kernels registered by _C.
- Restructure §8 into two stages: opt-125m for build sanity, then Llama-3.1-8B for kernel sanity. Update the "expected log lines" prefix to the new v0.21.0 TURBOQUANT-rejection wording, document the harmless first-request Triton JIT-compile warnings introduced by the new jit_monitor, and flag the "Cannot use ROCm custom paged attention kernel, falling back to Triton" line as expected on RDNA3 (it's a fallback *within* ROCM_ATTN, not a fallback *from* it).

Troubleshooting table gains rows for: the HIP_FOUND CMake error, the torch-ABI undefined-symbol case, the opt-125m false-pass scenario, the JIT-monitor warnings, and the paged-attention Triton fallback.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
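
When the build is driven from a script rather than an interactive shell, the `CMAKE_ARGS` override from this commit can be merged into the environment without clobbering flags that are already set. This is a sketch under assumptions: `with_hip_found` is a hypothetical helper, and the build command that would receive the environment is not shown because it is defined by the guide, not here.

```python
import os

HIP_FLAG = "-DHIP_FOUND=TRUE"


def with_hip_found(env) -> dict:
    """Return a copy of env whose CMAKE_ARGS contains HIP_FLAG exactly once.

    Existing CMAKE_ARGS content is preserved verbatim; the flag is appended
    only when it is not already one of the whitespace-separated arguments.
    """
    merged = dict(env)
    existing = merged.get("CMAKE_ARGS", "")
    if HIP_FLAG not in existing.split():
        merged["CMAKE_ARGS"] = f"{existing} {HIP_FLAG}".strip()
    return merged


# The result can be passed as env= to subprocess.run for the build step.
build_env = with_hip_found(os.environ)
print(build_env["CMAKE_ARGS"])
```

Appending rather than overwriting matters on hosts where `CMAKE_ARGS` already carries other project-specific definitions.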

grancier and others added 2 commits May 24, 2026 19:05

grancier changed the title ~~Triage all 27 dependabot alerts to zero~~ Triage 27 dependabot alerts to zero + bump vLLM build docs to v0.21.0 May 25, 2026

grancier changed the title ~~Triage 27 dependabot alerts to zero + bump vLLM build docs to v0.21.0~~ Triage 27 dependabot alerts to zero + harden vLLM 0.21.0 build docs May 25, 2026

grancier merged commit 65c4473 into master May 25, 2026

grancier merged 3 commits into master from chore/dependabot-triage
